DD2 biobank

Primary data derived from DD2 research studies


When individuals are enrolled in DD2, blood and urine samples are collected and stored in the biobank in Vejle. The samples themselves are considered “primary DD2 enrollment data”, and they are all collected at DD2 baseline, i.e., the time of the DD2 enrollment defined by the variable reg_dato. No automatic or standard analyses are conducted, but DD2 research projects can have the samples analyzed if (additional) analyses are needed. Irrespective of when the analyses are performed, the timing for the analysis results will always be baseline DD2, because that was the time the blood/urine sample was collected. The results from analyses of the blood and urine samples are considered “Additional DD2 data”.

For further information about the original concept behind the biobank, see Christensen et al. (2012).
The documentation for the biobank can be downloaded here (Danish, downloaded 12 January 2024):

Data sources

The data in the biobank are combined from multiple studies and datasets. The individual identifier in the biobank is ProjektID, which is unique per CPR number. In general, ProjektID ends with the digits -00. Another variable, Barcode (spelled Barkode in some files), resembles ProjektID but ends with a different number (e.g., -12, -19, or -99) that denotes the specific sample for the individual (see the documentation above).
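As an illustration of how the two identifiers relate, the following SAS sketch splits a barcode into its sample suffix and derives the corresponding ProjektID. It assumes that a barcode differs from the ProjektID only in the digits after the last hyphen, which should be checked against the biobank documentation. The example IDs, the dataset names barcodes and barcodes_id, and the variable SampleSuffix are hypothetical.

```sas
/* Hypothetical example barcodes; real IDs have a different structure */
data barcodes;
    length Barcode $20;
    input Barcode $;
    datalines;
ABC123-12
ABC123-19
XYZ789-99
;
run;

data barcodes_id;
    set barcodes;
    length ProjektID $20 SampleSuffix $2;
    /* The last hyphen-delimited segment identifies the specific sample */
    SampleSuffix = scan(Barcode, -1, '-');
    /* Assumption: replacing the suffix with 00 gives the ProjektID */
    ProjektID = cats(substr(Barcode, 1, length(Barcode) - length(SampleSuffix)), '00');
run;
```

With these example rows, ABC123-12 and ABC123-19 both map to ABC123-00, so samples from the same individual can be linked on the derived ProjektID.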

Locally at Department of Clinical Epidemiology (DCE), the raw data files are stored in the folder:
O:\HE_KEA-DATA-RAW0050\DD2 data\Main Part - Local DD2 database\data\Input Data Sets\BioBank

Data files from the 2022/2023 data updates are stored in the folder:
O:\HE_KEA-DATA-RAW0050\DD2 data\Biomarkører

Below is a list of the biobank data files available at DCE. File names might have changed since DCE initially received them.

First biomarkers

In the early phase of DD2, many biomarkers were analysed for the first 1,053 individuals enrolled in DD2. These individuals are marked by the variable BlodProve1053patients (based on the CPR numbers in data files First1053Patients and UrinResultatermedRatio2013Nov) and were predominantly enrolled in 2011-2012. Some of the individuals have later withdrawn consent and are no longer in the DD2 cohort.

  • First1053Patients.txt: The file appears to have been received in November 2012. It includes N=1,053 CPR numbers and these individuals are most likely everyone in the database at that time. The file includes results from the initial blood samples. The dataset lacks date information and also unit specifications for the variables hæmolyse, icteri, and lipæmi. The dataset includes the variables:

    • CPR
    • ProjektID
    • C-peptid (N=1,053, pmol/L)
    • GAD (N=1,053, kU/L)
    • Glucose (N=1,050, mmol/L)
    • ALAT (N=1,030, U/L)
    • Hæmolyse (N=1,030)
    • Icteri (N=1,030)
    • Lipæmi (N=1,030)
    • AMYLP (N=1,041, U/L)
    • CRP (N=1,041, mg/L)

It is recommended not to use the c-peptide measurements from this file, since the samples were analysed using an old type of analysis kit (see below).

Information about some of the variables can be found in Mor et al. (2014).

  • UrinResultatermedRatio2013Nov.txt: DCE likely received the data file in December 2013. It includes N=1,053 CPR numbers (same as in First1053Patients) with results from the initial analyses of the urine samples. The dataset includes the variables:

    • CPR
    • Barkode (ends with -19)
    • ALBu_mg_L_ (N=1,041, mg/L, dated November 19, 20, and 23, 2012)
    • KREA_mmol_L_ (N=1,041, mmol/L, dated November 19, 20, and 23, 2012)
    • UPROT_g_L_ (N=1,041, g/L, dated November 19, 20, and 23, 2012)
    • Albumin_Kreatinin_ratio (N=1,041)
    • Kommentar (N=473, <5 with “PROT-U > 2.5 g/l” and the rest with “ALB-U <3 mg/L”)

Additional biomarkers

  • DataTilDD2medFastende.txt: DCE probably received the file in September 2015. It includes N=5,996 CPR numbers with character results on GAD, glucose, and c-peptide, along with a variable about fasting status. There are no dates in the data. The majority of the individuals in the file were enrolled before 2015, but the file covers only around 80-85% of all individuals enrolled before 2016. The dataset includes the variables:

    • Id (ProjectID, ends with -00)
    • GAD (N=131 numeric values, N=5,797 with the value “<0,000”, and N=51 with “>525,000”)
    • Glucose (N=4,457)
    • Cpeptid (N=5,964)
    • Fastende (“Ja” for N=2,891, “Nej” for N=397, and “Ved ikke” for the remaining N=2,708) (see below for more information about fasting blood samples)
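Because the GAD results are stored as character values with a decimal comma and with "<" or ">" signs, they need some care before numeric use. A minimal SAS sketch follows, assuming that "<" and ">" mark values outside the measuring range and that the comma is a decimal separator. The dataset names, the example rows, and the variables GAD_flag and GAD_value are hypothetical.

```sas
/* Hypothetical example rows in the same style as the GAD variable */
data gad_raw;
    length Id $12 GAD $12;
    input Id $ GAD $;
    datalines;
P1-00 <0,000
P2-00 12,500
P3-00 >525,000
;
run;

data gad_num;
    set gad_raw;
    length GAD_flag $1;
    /* Keep the sign (< or >) in a separate flag variable */
    if GAD =: '<' then GAD_flag = '<';
    else if GAD =: '>' then GAD_flag = '>';
    else GAD_flag = ' ';
    /* The NUMX informat reads a comma as the decimal separator */
    GAD_value = input(compress(GAD, '<>'), numx12.);
run;
```

Keeping the flag separate from the numeric value makes it possible to decide later how values outside the measuring range should be handled in an analysis.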

The majority of the initial 1,053 individuals are also included in this data file. It was initially recommended to use the c-peptide measurements in this data file and not the original data file, First1053Patients.txt. Later, in 2023, DCE received new data with fully updated c-peptide and glucose values (see below).

  • WrongCPeptideMeasurements.txt: The file appears to have been received in November 2015. It includes N=105 ID numbers (end with -00) and the variables cpeptid and NyCpeptid. The analysis was performed to compare values from different analysis kits, and the values were quite different. It is recommended not to use any of the measurements from this file.

MBL data

DCE received variables and data regarding mannose-binding lectin (MBL) in September 2017 as part of the PhD project Gedebjerg (2020). See also DD2 project description.

  • DD2 resultater.xlsx (and DD2 resultater_10_0095, which is the version where <10 is replaced by 10 and <0,095 by 0,095): DCE received the file in September 2017. It includes N=7,519 barcodes (end with -99) in sheets of 100 or 101 rows, and should include CRP and MBL on everyone enrolled by December 2016. See Gedebjerg et al. (2023) for more information. The dataset includes the variables:

    • barkode (ends with -99)
    • CRP (N=7,510, mg/L)
    • MBL (N=7,514, µg/L)

DCE was told to keep the CRP measurements from the original data file (First1053Patients.txt) and this file separate. The unit for CRP is mg/L in both data files.

  • Resultater den 250917 Anne Gedebjerg.xlsx: DCE received the file in September 2017. It includes N=3,116 barcodes (end with -99) and variables regarding MBL expression genotyping (six SNPs in the MBL2 gene). The genotyping was done for the first ~3,000 individuals enrolled in DD2. See Gedebjerg et al. (2020) for more information. The dataset includes the variables:

    • barkode (ends with -99)
    • HL
    • XY
    • PQ
    • 52
    • 54
    • 57
    • HAPLOTYPE

April 2022 data

During 2022-2023 DCE received additional data on CRP, c-peptide, and glucose. A file with CRP was received in October 2022, but it is fully included in a file from January 2023 which also includes c-peptide and glucose. The October file is therefore not used during uploads, whereas the January 2023 file has been uploaded to the servers.

  • DD2_cRP_Glucose_Cpep_2022_resultater (1).xlsx: The data file includes N=3,399 ProjektIDs. All were enrolled after the first N=1,053 individuals, but the file does not cover every individual enrolled in a given year. It is not known exactly why these individuals were analysed, but it might have been part of the IDA study. The dataset includes the variables: projekt_id, Cpeptid_Barkode, Cpeptid_Resultat, Cpeptid_Måleenhed, Cpeptid_Antal_decimaler, Cpeptid_Dato, Cpeptid_Notat, CRPHS_Barkode, CRPHS_Resultat, CRPHS_Måleenhed, CRPHS_Antal_decimaler, CRPHS_Dato, CRPHS_Notat, Glukose_Barkode, Glukose_Resultat, Glukose_Måleenhed, Glukose_Antal_decimaler, Glukose_Dato, Glukose_Notat.

    • projekt_id (N=3,339)
    • c-peptid (N=2,933, pmol/L, dated 30APR2022 or 01MAY2022)
    • CRP (N=2,478, mg/L, dated 30APR2022 or 01MAY2022)
    • Glucose (N=3,055, mmol/L, dated 02APR2022 or 03APR2022)

C-peptide and glucose

During the summer of 2023, DCE received data on c-peptide (July 2023) and glucose (August 2023). These files include all cleaned c-peptide and glucose measurements from the biobank, and results from these files therefore replace all measurements from earlier datasets. We now have the files:

  • dd2_all_C_peptide_14July2023.xls: Includes N=9,762 ProjektIDs with data on c-peptide. The file also includes information about analysis date, sampling date, freezing date, unit, and kit. The following data management has been done by DD2 before DCE received the file:
Quote from e-mail (translated from Danish):
- All 1,254 observations that had "old" measurements (before 28 February 2015) have been replaced with updated measurements using the new kit/assay (after 28 February 2015).
- 6 observations that had only an "old" measurement and NO updated re-measurement have been deleted.
- The data have been cleaned so that each individual appears with only one measurement, analysed using the new kit/assay.

Please note that the unit for c-peptide has changed from pmol/L to nmol/L.

  • DD2_glukoser_2023_08_23.xlsx: The data file includes N=9,563 projekt_id and information about glucose, units, and dates.

HOMA

HOMA values are calculated based on c-peptide and glucose. We use the Oxford calculator which can be downloaded from the website: HOMA calculator. Since summer 2024, access to the HOMA calculator requires a licence (free of charge for “academic researchers”).

HOMA values in the data are calculated from the c-peptide and glucose measurements received during the summer of 2023. HOMA has only been estimated for glucose values in the interval 3.0-25 mmol/L and c-peptide values in the interval 0.2-3.5 nmol/L, because out-of-range values cause problems in the Excel version of the updated Oxford HOMA calculator. Some individuals might therefore have glucose and c-peptide values but no HOMA values in the new data. HOMA values are estimated regardless of fasting status.
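The range restriction can be sketched in SAS as a simple flag. This is only an illustration, assuming that glucose and c-peptide are stored in the same units as the stated intervals. The dataset names, the example rows, and the variables glucose, cpeptide, and homa_in_range are hypothetical.

```sas
/* Hypothetical example rows; variable names are illustrative */
data homa_example;
    input ProjektID $ glucose cpeptide;
    datalines;
A-00 7.2 1.1
B-00 2.8 0.9
C-00 9.5 4.0
D-00 . 1.3
;
run;

data homa_eligible;
    set homa_example;
    /* 1 = both values lie inside the intervals used for HOMA estimation */
    /* A missing value sorts below any number, so it gives 0 here */
    homa_in_range = (3.0 <= glucose <= 25) and (0.2 <= cpeptide <= 3.5);
run;
```

In the example rows, only A-00 is flagged as in range. B-00 has glucose below 3.0, C-00 has c-peptide above 3.5, and D-00 has missing glucose.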

“Pladebiomarkører”

During 2022-2023, DCE received new data from additional biomarker analyses. Because of the way the analyses were performed, the new biomarkers are referred to as “pladebiomarkører”, as opposed to the previous ones, which are called “målebiomarkører”. In practice, everyone enrolled as of the day the blood samples were taken from the biobank was included in the analysis. This was at the beginning of 2022 and includes approximately the first 9,200 individuals. A small number of individuals have multiple measurements for specific biomarkers, most likely due to sample dilution during the analysis process. Data are in long format and include a total of 22 different biomarkers, all with unit pg/ml. The 22 biomarkers are listed here, and the dates refer to when DCE received the data files:

  • TNF-a (April 2022, N=9,202)
  • IL-6 (April 2022, N=9,195)
  • Ang-Like4 (November 2022, N=9,200)
  • FGF-21 (November 2022, N=9,200)
  • FGF-23 (November 2022, Hu FGF-23, N=9,200)
  • IL1-RA (November 2022, N=9,200)
  • Leptin (November 2022, N=9,196)
  • RAGE (November 2022, soluble, N=9,200)
  • Sclerostin (November 2022, N=9,200)
  • U-PAR (November 2022, N=9,200)
  • Osteocalcin-1 (February 2023, N=9,203)
  • CD163 (April 2023, N=9,047)
  • Galectin-3 (April 2023, N=9,008)
  • GDF-15 (April 2023, N=9,046)
  • NT-proBNP (April 2023, N=9,048)
  • Resistin (April 2023, N=9,046)
  • Serpin (April 2023, N=9,047)
  • YKL-40 (April 2023, N=9,045)
  • Osteopontin (June 2023, N=9,204)
  • Adiponectin (July 2023, N=9,204)
  • Follistatin (July 2023, N=9,204)
  • MPO (July 2023, N=9,204)

An overview of the biomarkers (table from the grant application) can be found here:

An additional document combining overview sheets, method descriptions, and quality logs from some of the analysis rounds can be downloaded here:

The data files with “pladebiomarkører” and “målebiomarkører” do not use the same format (e.g., long vs. wide format) and are therefore not combined.
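If a project needs both types of biomarkers in one table, the long-format data can be reshaped to one row per individual. The following SAS sketch shows one possible approach and is not part of the DCE data management. The dataset names biomark_long and biomark_wide and the example rows are hypothetical, and each individual is assumed to have at most one value per assay.

```sas
/* Hypothetical long-format example: one row per individual and assay */
data biomark_long;
    length CPR $10 Assay $20;
    input CPR $ Assay $ Value;
    datalines;
CPR1 TNF-a 1.2
CPR1 IL-6 0.8
CPR2 TNF-a 2.4
CPR2 IL-6 1.1
;
run;

/* PROC TRANSPOSE with a BY statement needs sorted input */
proc sort data=biomark_long;
    by CPR;
run;

/* One output row per CPR, one column per assay */
proc transpose data=biomark_long out=biomark_wide(drop=_name_);
    by CPR;
    id Assay;
    var Value;
run;
```

With the default VALIDVARNAME=V7 setting, characters such as the hyphen in assay names are expected to be replaced by underscores in the new column names (e.g., TNF_a). PROC TRANSPOSE stops with an error if the same assay occurs twice for one individual, so repeated measurements would have to be resolved first.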

Data files (pladebiomarkører)

This section is an overview of the data files DCE received.

  • April 2022, 220211 Vplex_final.xlsx with 2 sheets (data and background information). Data were received in April 2022 but the analysis was probably performed in February 2022 based on the date stamps in file names. The data file includes data from N=9,294 individuals on IL-6 and TNF-a. The first sheet, Vplex sample results_final, includes the variables: Sample (id, ends with -12), Sample_Group (=Sample in all rows), Assay (either TNF-a or IL-6), Calc__Conc__Mean (results), RANGE (value 1 or 2), Plate_Name (each Plate_Name is used 78 times).

    • TNF-a (N=9,202, pg/ml)
    • IL-6 (N=9,195, pg/ml)

    The second sheet, Vplex complete final, includes background information (raw data, “rådata”) about the samples in the sample groups Sample (N=9,024), Standards (N=1,888), and Internal Control (N=236). The sheet includes the variables Plate_Name, Sample_Group, Sample, Assay, Well, Signal, Mean, CV, Calc__Concentration, Calc__Conc__Mean, Calc__Conc__CV, __Recovery, __Recovery_Mean, Detection_Limits__Calc__Low, Detection_Limits__Calc__High, Detection_Range, Detection_Range_yesno, Quantification_range, Quantification_range_yesno, RANGE.

    DCE also received the data file DD2 quality log_panel 1 edited_TNF IL6.xlsx, but it has not been used.

  • November 2022, 8plex data final.xlsx, with 9 sheets (overview + 8 biomarkers). DCE received the data file in November 2022. There are no date stamps indicating when the analyses were performed. The data file includes information on N=9,204 individuals (based on sample ID ending with -12). For each assay, the data file includes the variables sample (ID), assay, calc__conc__mean (result), RANGE, and plate_name. DCE was informed that the unit is pg/ml for all assays.

    • Ang-Like4 (N=9,200, pg/ml)
    • FGF-21 (N=9,200, pg/ml)
    • FGF-23 (Hu FGF-23, N=9,200, pg/ml)
    • IL1-RA (N=9,200, pg/ml)
    • Leptin (N=9,196, pg/ml)
    • RAGE (soluble, N=9,200, pg/ml)
    • Sclerostin (N=9,200, pg/ml)
    • U-PAR (N=9,200, pg/ml)

    DCE also received the data files eight biomarkers with CPR.xlsx and seven biomarkers with CPR.xls, but these have not been used.

  • February 2023, DD2 osteocalcin final.xlsx with 3 sheets (overview, results, and rådata). DCE received the file in February 2023, but there is no indication of when the analyses were performed. The file includes N=9,204 individuals (sample, ends with -12). The data file includes the variables sample (ID), assay, calc__conc__mean (result), RANGE, and plate_name.

    • Osteocalcin-1 (N=9,203, pg/ml)

    The sheet rådata includes detailed information about each of the plates.

  • April 2023, DD2_7plex_data_final.xlsx with 9 sheets (overview + 7 biomarkers + additional sheet with sample names). DCE received the data file in April 2023. There are no date stamps indicating when the analyses were performed. The data file includes information on N=9,048 individuals (based on sample ID ending with -12). For each assay, the data file includes the variables sample (ID), assay, calc_conc_mean (result), RANGE, and plate_name. DCE was informed that the unit is pg/ml for all assays.

    • CD163 (N=9,047, pg/ml)
    • Galectin-3 (N=9,008, pg/ml)
    • GDF-15 (N=9,046, pg/ml)
    • NT-proBNP (N=9,048, pg/ml)
    • Resistin (N=9,046, pg/ml)
    • Serpin (N=9,047, pg/ml)
    • YKL-40 (N=9,045, pg/ml)

  • June 2023, Osteopontin data final (1).xlsx with 3 sheets (overview + data + rådata). DCE received the file in June 2023, but there is no indication of when the analyses were performed. It includes N=9,207 ID numbers (end with -12, plus a note that it means “EDTA plasma fraction”) with data on osteopontin.

    • Osteopontin (N=9,204, pg/ml)

  • July 2023 (1), DD2_adiponectin_blue panel_final (1).xlsx with 3 sheets (overview + data + rådata). DCE received the file in July 2023, but there is no indication of when the analyses were performed. It includes N=9,204 ID numbers (sample, end with -12) with data on adiponectin.

    • Adiponectin (N=9,204, pg/ml)

    The sheet rådata includes detailed information about each of the plates.

  • July 2023 (2), DD2 red panel_final.xlsx with 4 sheets (overview + data (Follistatin + MPO) + rådata). DCE received the file in July 2023, but there is no indication of when the analyses were performed. It includes N=9,204 ID numbers (sample, end with -12) with data on follistatin and MPO.

    • Follistatin (N=9,204, pg/ml)
    • MPO (N=9,204, pg/ml)

    The sheet rådata includes detailed information about each of the plates.

Fasting

Was the individual fasting when the blood sample was drawn? A simple question, yet difficult to answer.

Upon enrollment, individuals are instructed to fast: no food or liquid (except water) from 10 o’clock the evening before. Also, while fasting, the patient should not take any glucose-regulating drugs. Data from the DD2 questionnaire itself include the variable Er_patienten_fastende_ (whether the patient is fasting). Currently, around 75% of the individuals have answered that they were fasting. This variable can be used on its own, but should probably be combined with the variable Tages_der_blodproeve_i_forbindel (whether the blood sample was taken at the same time as the questionnaire was answered). If fasting is restricted to individuals whose blood sample was drawn at the same time as the questionnaire was answered, around 72% of the individuals are defined as fasting.

The data file DataTilDD2medFastende.txt, received in September 2015, includes information on fasting status for N=5,996 individuals (“Ja” for N=2,891, “Nej” for N=397, and “Ved ikke” for N=2,708 individuals). It is not known how this variable was defined, but it is probably based on information about the blood sample itself. By adding the information from this file (variable NewFastende) to the variable Er_patienten_fastende_, an additional 447 individuals can be defined as fasting (410 with missing information in Er_patienten_fastende_ and 37 who replied that they were not fasting in Er_patienten_fastende_). However, the data file will not be updated.
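A cross-check of this kind can be sketched in SAS by tabulating the questionnaire answer among individuals with NewFastende equal to “Ja”. The dataset name fasting_example and the example rows are hypothetical; only the two variable names come from the data.

```sas
/* Hypothetical example rows: questionnaire answer, 2015 file answer */
data fasting_example;
    length Er_patienten_fastende_ $8 NewFastende $8;
    infile datalines dsd dlm=',' truncover;
    input Er_patienten_fastende_ $ NewFastende $;
    datalines;
Ja,Ja
Nej,Ja
,Ja
,Ved ikke
Ja,Nej
;
run;

proc freq data=fasting_example;
    /* Keep rows where the 2015 file indicates fasting */
    where NewFastende =: 'Ja';
    /* MISSING shows blank questionnaire answers as their own category */
    tables Er_patienten_fastende_ / missing;
run;
```

In the real data, the “Nej” and blank categories of such a table would be expected to correspond to the 37 and 410 individuals mentioned above.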

Currently, the fasting state has been defined by the following (SAS) algorithm combining all the files and stating that the individual is fasting if there is any indication that this could be the case (macro: AdditionalVars_Faste):

    /* Blood sample taken together with the questionnaire: use both sources */
    if Tages_der_blodproeve_i_forbindel in: ('Ja') then do;
        if Er_patienten_fastende_=:'Ja' or NewFastende=:'Ja' then Faste='Ja';
        else if Er_patienten_fastende_=:'Nej' or NewFastende=:'Nej' then Faste='Nej';
    end;
    /* Blood sample taken at another time: rely on NewFastende only */
    else if Tages_der_blodproeve_i_forbindel in: ('Nej') then do;
        if NewFastende=:'Ja' then Faste='Ja';
        else if NewFastende=:'Nej' then Faste='Nej';
    end;
    /* Timing unknown (blank): use both sources */
    else if Tages_der_blodproeve_i_forbindel in (' ') then do;
        if Er_patienten_fastende_=:'Ja' or NewFastende=:'Ja' then Faste='Ja';
        else if Er_patienten_fastende_=:'Nej' or NewFastende=:'Nej' then Faste='Nej';
    end;
    /* Faste is not assigned here when no Ja/Nej answer is available */

Note: Steno Diabetes Center Odense plans to make an “official” definition of fasting status.


Data documentation

biobank.sas7bdat

| Format (var x obs) | Id variables | Unique key | Important dates |
|---|---|---|---|
| Wide (48 x 11,381) | CPR, ProjektID | CPR (ProjektID) | VejleDato |

All data files except the ones including the 22 variables from the “pladebiomarkører” are combined and included in the biobank dataset. A few variables from dd2core (e.g. reg_dato and Er_patienten_fastende_) are also included in the biobank dataset.

Data include analysis results from successful analyses. Not all analyses are performed for all individuals (missing data), and there is no additional information about specific analyses (i.e., project, analysis method, unit/kit, non-successful analyses etc.). In some versions of the dataset, rows are included for all CPR numbers in the population, even if no analysis results are available.

Illustration of the overall data structure. The dataset is in wide format (48 x 11,381), with CPR or ProjektID as the unique key.
Row CPR ProjektID Analysis1 Analysis2 Analysis3
1 CPR1 ProjektID1 num. num.
2 CPR2 ProjektID2 num. num.
3 CPR3 ProjektID3
4 CPR4 ProjektID4 num. num. num.
12,098 CPR12098 ProjektID12098 num. num. num.

biomark.sas7bdat

| Format (var x obs) | Id variables | Unique key | Important dates |
|---|---|---|---|
| Long (9 x 201,243) | CPR, ProjektID, ydernr | CPR*Assay | (Vejledato) |

The biomark dataset includes analysis results from the 22 “pladebiomarkører”. No dates are included in the dataset, but all analyses were performed on the enrollment blood sample. With approximately 9,200 individuals and 22 biomarkers, the dataset is expected to contain about 9,200*22=202,400 rows. In principle, CPR*Assay should be the unique key; however, some analyses were performed multiple times per individual (due to dilution in the analysis).

Illustration of the overall data structure. The dataset is in long format (9 x 201,243), with CPR*Assay as the unique key.
Row CPR ProjektID Assay Value Info
1 CPR1 ProjektID1 TNF-a num.
2 CPR1 ProjektID1 IL-6 num.
3 CPR1 ProjektID1 Ang-Like4 num.
22 CPR1 ProjektID1 MPO num.
23 CPR2 ProjektID2 TNF-a num.
24 CPR2 ProjektID2 IL-6 num.
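Since CPR*Assay is not strictly unique, it can be useful to list the combinations that occur more than once before using the data. A minimal SAS sketch follows; the dataset names biomark_example and dup_keys, the column n_rows, and the example rows are hypothetical.

```sas
/* Hypothetical example rows with one repeated CPR*Assay combination */
data biomark_example;
    length CPR $10 Assay $20;
    input CPR $ Assay $ Value;
    datalines;
CPR1 TNF-a 1.2
CPR1 TNF-a 1.4
CPR1 IL-6 0.8
CPR2 TNF-a 2.4
;
run;

proc sql;
    /* One row per CPR*Assay combination that occurs more than once */
    create table dup_keys as
    select CPR, Assay, count(*) as n_rows
    from biomark_example
    group by CPR, Assay
    having count(*) > 1;
quit;
```

In the example rows, only the combination CPR1 and TNF-a is returned, with n_rows equal to 2. How repeated measurements should be handled (for example, which dilution to keep) is a project-specific decision and is not defined here.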

References

Christensen H, Nielsen JS, Sørensen KM, Melbye M, Brandslund I. New national biobank of the Danish center for strategic research on type 2 diabetes (DD2). Clin Epidemiol. 2012;4:37–42.
Gedebjerg A. Complications of type 2 diabetes prevalence and association with mannose-binding lectin. Aarhus Universitet, Department of Clinical Epidemiology; 2020. (PhD thesis).
Gedebjerg A, Bjerre M, Kjaergaard AD, Nielsen JS, Rungby J, Brandslund I, et al. CRP, c-peptide, and risk of first-time cardiovascular events and mortality in early type 2 diabetes: A Danish cohort study. Diabetes Care. 2023;46(5):1037–45.
Gedebjerg A, Bjerre M, Kjaergaard AD, Steffensen R, Nielsen JS, Rungby J, et al. Mannose-binding lectin and risk of cardiovascular events and mortality in type 2 diabetes: A Danish cohort study. Diabetes Care. 2020;43(9):2190–8.
Mor A, Svensson E, Rungby J, Ulrichsen SP, Berencsi K, Nielsen JS, et al. Modifiable clinical and lifestyle factors are associated with elevated alanine aminotransferase levels in newly diagnosed type 2 diabetes patients: Results from the nationwide DD2 study. Diabetes Metab Res Rev. 2014;30(8):707–15.